I have a similar issue, but reversed. If I set the volume anywhere between 0 and 60% in the ES Sound menu, there is no audio at all while video snaps are playing. Between 60 and 100%, the video snap volume rises proportionally. In games, however, the volume is correct across the whole range.
Therefore, if I set the volume high enough to hear the video snaps, my in-game volume is often too high.
All of this started when I began troubleshooting why some video snaps were causing ES to crash. Switching to the HW Accelerated OMX player fixed the crashes, and changing a few other ES Sound menu options finally made the volume adjustment work. I believe the key fix was changing "PCM" to "Speaker" so that it matches what is shown under the volume control when running 'alsamixer' via SSH. Once they matched, the volume setting worked and was retained instead of reverting to 0.
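For anyone who wants to check the control name without reading it off the alsamixer screen, something like this Python sketch should list the ALSA simple mixer controls. It assumes `amixer` (from alsa-utils) is installed and on the PATH; the helper names `parse_scontrols` and `list_mixer_controls` are made up for illustration.

```python
import re
import shutil
import subprocess


def parse_scontrols(text: str) -> list[str]:
    """Extract control names from `amixer scontrols` output.

    Each line is expected to look like: Simple mixer control 'PCM',0
    """
    return re.findall(r"^Simple mixer control '([^']+)'", text, flags=re.MULTILINE)


def list_mixer_controls() -> list[str]:
    """Return control names for the default card, or [] if amixer is missing."""
    if shutil.which("amixer") is None:
        return []  # assumption: treat a missing amixer as "nothing to report"
    result = subprocess.run(
        ["amixer", "scontrols"], capture_output=True, text=True, check=False
    )
    return parse_scontrols(result.stdout)


if __name__ == "__main__":
    print(list_mixer_controls())
```

Whatever name it prints is the one the ES sound setting would need to match, at least in my experience.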
Any idea why I have a mismatch between video snap volume and in-game volume?
Video Snap >>> 60 to 100% in ES = 0 to 100% actual volume
In Game >>> 0 to 100% in ES = 0 to 100% actual volume
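
To put rough numbers on that mapping, here is a small Python sketch. It only models what I'm hearing, assuming the video snap volume rises linearly from silence at 60% to full volume at 100%; the function names and the linear shape are my assumptions, not anything taken from the ES code.

```python
def snap_volume(es_percent: float, threshold: float = 60.0) -> float:
    """Approximate audible video snap volume (0-100) for an ES setting (0-100).

    Assumption: silent at or below the threshold, then linear up to 100.
    """
    if es_percent <= threshold:
        return 0.0
    return (es_percent - threshold) / (100.0 - threshold) * 100.0


def game_volume(es_percent: float) -> float:
    """In-game volume tracks the ES setting one to one."""
    return float(es_percent)


if __name__ == "__main__":
    for setting in (0, 60, 80, 100):
        print(setting, snap_volume(setting), game_volume(setting))
```

Under that model, an ES setting of 80% would give about half volume for video snaps but 80% in games, which is the mismatch I'm describing.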